{
  "cells": [
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/chatbots/nemo-guardrails/03-rag-with-actions.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/chatbots/nemo-guardrails/03-rag-with-actions.ipynb)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "u6m1lOaUcKl5"
      },
      "source": [
        "# Retrieval Augmented Generation (RAG) with Actions\n",
        "\n",
        "Using actions in NeMo Guardrails gives us the ability to do a lot of things without needing heavy agent decision making pipelines. We can enable to use of tools with little more than what is essentially a \"fuzzy logic match\" between a users input and the intent groups that we define in our Colang files.\n",
        "\n",
        "In this example, we'll be taking a look at how to apply the power of **R**etrieval **A**ugmented **G**eneration (RAG) to LLMs using nothing more than Colang logic.\n",
        "\n",
        "We'll get started by installing the prerequisite libraries:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "pC-PQoCxcKl6",
        "outputId": "a3236f70-4751-40ce-bb42-1f6b1e3dea73"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m13.9/13.9 MB\u001b[0m \u001b[31m15.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m179.1/179.1 kB\u001b[0m \u001b[31m18.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m519.1/519.1 kB\u001b[0m \u001b[31m46.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m73.6/73.6 kB\u001b[0m \u001b[31m8.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m57.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m1.4/1.4 MB\u001b[0m \u001b[31m63.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m62.6/62.6 kB\u001b[0m \u001b[31m7.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m1.2/1.2 MB\u001b[0m \u001b[31m69.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m647.5/647.5 kB\u001b[0m \u001b[31m56.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m86.0/86.0 kB\u001b[0m \u001b[31m7.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m57.1/57.1 kB\u001b[0m \u001b[31m5.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m67.0/67.0 kB\u001b[0m \u001b[31m7.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m5.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m71.5/71.5 kB\u001b[0m \u001b[31m6.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m60.0/60.0 kB\u001b[0m \u001b[31m7.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m300.3/300.3 kB\u001b[0m \u001b[31m28.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m14.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m19.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m16.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m26.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m69.6/69.6 kB\u001b[0m \u001b[31m9.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m90.0/90.0 kB\u001b[0m \u001b[31m10.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m7.4/7.4 MB\u001b[0m \u001b[31m80.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m71.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m6.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m49.4/49.4 kB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m104.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m72.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Building wheel for annoy (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Building wheel for sentence-transformers (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "ipython 7.34.0 requires jedi>=0.16, which is not installed.\n",
            "cvxpy 1.3.2 requires setuptools>65.5.1, but you have setuptools 65.5.1 which is incompatible.\n",
            "google-colab 1.0.0 requires requests==2.27.1, but you have requests 2.31.0 which is incompatible.\u001b[0m\u001b[31m\n",
            "\u001b[0m"
          ]
        }
      ],
      "source": [
        "!pip install -qU \\\n",
        "    nemoguardrails==0.4.0 \\\n",
        "    pinecone-client==2.2.2 \\\n",
        "    datasets==2.14.3 \\\n",
        "    openai==0.27.8"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Knowledge Base Download"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "L0709sN9cKl8"
      },
      "source": [
        "To begin, we need to setup our data and retrieval components for RAG. We'll start with a dataset that contains info on the recent Llama 2 models:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 264,
          "referenced_widgets": [
            "15e370c281f5451d90803f44a18b5df3",
            "3f440a391336415186b1061a3b664bca",
            "3dd53f401a0c45a6b239a6bb3de4ae8c",
            "9ebb0b2d61b443fcb02d4c442242a633",
            "1cb150c3faa6443fb2cb33de7b323b04",
            "1940183a0f3f44e2a7b977ffdc409d97",
            "f38778d2645b4325b772a0f3b35006a9",
            "20f24b406fef48148954e30c760a2585",
            "5b09bc28459d4f7da410fd489f822541",
            "0c839559ac8140968888a690475fbb9d",
            "866651d572d346bc87a58b3532d08789",
            "0a67007519b94e9f889dfb812e8aed57",
            "978ee6c4061842868b523768fdef761f",
            "c55fe51f5fa34b40b5838151f4bd8f80",
            "8765d2a63da649f391600d85be4ffa3f",
            "eb420851e4ab4ec8bcd6ffe029334f3d",
            "156b6881e2ab4aa9b36dad3425515a8c",
            "5f6719dce40249ae9d7be18dd945e142",
            "da0c131629e74c1abaaf302dba2501db",
            "37dd8ba79ea3497f8d93a975ee2e31f7",
            "86d6d86419ed4575a219a83022c6f2d8",
            "47a73febe8344f28990b7ef0a48e0fdf",
            "c2f722a0dabf4423b9dab3650654eb3d",
            "04963424351e4e189fd1accff0761c13",
            "e3be547b6cf74014ba896988331d58be",
            "23481861d34040eaad6a44afb220abd5",
            "0466c1a6801c4523898e736d7fd840e4",
            "43333bd25ea64ab0a69e5c6523bb4f12",
            "1ce6d260cfbc42a3819b5eea477454c6",
            "c049b4a572c2401ca8f5ec1e97ce7eec",
            "2ff85b2c3d99438f95741225842eeee1",
            "4d7768dee55b4a5d9a6c508145773d4c",
            "40e395600ca6480ea0faf3e43b6a97a0",
            "d91d537e023a4d05b92e2ac816decb56",
            "2ef4863c67144f73957265f05f45f4b0",
            "b87feaecd2dd429eab247ef11526b83a",
            "476baf88f01144e0b440507f440de9fb",
            "f952a2d871ff42dc9e573b73ae6bf7a2",
            "a2402336caca46318d374954ba896626",
            "fd3afd88485c4bb2a92b0171d58df088",
            "c5caf402e5a04706a679957eb4f6db45",
            "7f68415905974927b5dabf7afe1a212c",
            "288b2f021a17414aae208194f49fa1bb",
            "e327d29d8754472e8842b6c180e7e3ec",
            "c0226a557e4f4db28b2aef5bb1957d1f",
            "f995c312c6004029bf9bcd6ef1e8c968",
            "64dc7303faf442dc9e11ab2b9c32ef27",
            "52fc99d752d74084a13a800fa4a32c5c",
            "60ada0e7c07d4296a851f7e7502d7076",
            "74507837428a4c73a7f7cc0c14eb5943",
            "f34ac26c9268462ca8a160ab399eb3d8",
            "e9f1b5eea416421f90bedd998ebd3d31",
            "a992efa50d014755a5bf9e0ca94aecba",
            "27dccef87f8a49ce9e477ee3fea03aac",
            "dc7a4ff1e33545688788d4e84382f3da"
          ]
        },
        "id": "rDo76dejcKl8",
        "outputId": "04638843-b458-406a-9ead-b896ec372358"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "15e370c281f5451d90803f44a18b5df3",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Downloading readme:   0%|          | 0.00/409 [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "0a67007519b94e9f889dfb812e8aed57",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "c2f722a0dabf4423b9dab3650654eb3d",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Downloading data:   0%|          | 0.00/14.4M [00:00<?, ?B/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "d91d537e023a4d05b92e2ac816decb56",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "c0226a557e4f4db28b2aef5bb1957d1f",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Generating train split: 0 examples [00:00, ? examples/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "Dataset({\n",
              "    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],\n",
              "    num_rows: 4838\n",
              "})"
            ]
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "data = load_dataset(\n",
        "    \"jamescalam/llama-2-arxiv-papers-chunked\",\n",
        "    split=\"train\"\n",
        ")\n",
        "data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "LuLjiD5ycKl8",
        "outputId": "4837a2fe-c4b7-4f34-9410-1722139c2ba7"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'doi': '1102.0183',\n",
              " 'chunk-id': '0',\n",
              " 'chunk': 'High-Performance Neural Networks\\nfor Visual Object Classi\\x0ccation\\nDan C. Cire\\x18 san, Ueli Meier, Jonathan Masci,\\nLuca M. Gambardella and J\\x7f urgen Schmidhuber\\nTechnical Report No. IDSIA-01-11\\nJanuary 2011\\nIDSIA / USI-SUPSI\\nDalle Molle Institute for Arti\\x0ccial Intelligence\\nGalleria 2, 6928 Manno, Switzerland\\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\\nTechnical Report No. IDSIA-01-11 1\\nHigh-Performance Neural Networks\\nfor Visual Object Classi\\x0ccation\\nDan C. Cire\\x18 san, Ueli Meier, Jonathan Masci,\\nLuca M. Gambardella and J\\x7f urgen Schmidhuber\\nJanuary 2011\\nAbstract\\nWe present a fast, fully parameterizable GPU implementation of Convolutional Neural\\nNetwork variants. Our feature extractors are neither carefully designed nor pre-wired, but',\n",
              " 'id': '1102.0183',\n",
              " 'title': 'High-Performance Neural Networks for Visual Object Classification',\n",
              " 'summary': 'We present a fast, fully parameterizable GPU implementation of Convolutional\\nNeural Network variants. Our feature extractors are neither carefully designed\\nnor pre-wired, but rather learned in a supervised way. Our deep hierarchical\\narchitectures achieve the best published results on benchmarks for object\\nclassification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with\\nerror rates of 2.53%, 19.51%, 0.35%, respectively. Deep nets trained by simple\\nback-propagation perform better than more shallow ones. Learning is\\nsurprisingly rapid. NORB is completely trained within five epochs. Test error\\nrates on MNIST drop to 2.42%, 0.97% and 0.48% after 1, 3 and 17 epochs,\\nrespectively.',\n",
              " 'source': 'http://arxiv.org/pdf/1102.0183',\n",
              " 'authors': ['Dan C. Cire\u015fan',\n",
              "  'Ueli Meier',\n",
              "  'Jonathan Masci',\n",
              "  'Luca M. Gambardella',\n",
              "  'J\u00fcrgen Schmidhuber'],\n",
              " 'categories': ['cs.AI', 'cs.NE'],\n",
              " 'comment': '12 pages, 2 figures, 5 tables',\n",
              " 'journal_ref': None,\n",
              " 'primary_category': 'cs.AI',\n",
              " 'published': '20110201',\n",
              " 'updated': '20110201',\n",
              " 'references': []}"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "data[0]"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "ohXiH7DxcKl8"
      },
      "source": [
        "We mainly want the information contained within the `chunk` parameter, although we can pull in other bits of data as metadata for use later. We'll also create a new unique ID for each record by concatenating the `doi` and `chunk-id` fields."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 136,
          "referenced_widgets": [
            "7090556da17f4a45a881deff320a301f",
            "d4749795649442b29a06d1d43964aa7f",
            "29e6f822b0f94e5eb70f236a648f2518",
            "57f0df54e956480c88bef41f27a3253c",
            "058ca681ca89482aad52e7bac45f4799",
            "2edcf48f48d54eb7817fb3cbf04104d7",
            "32e69f8655a442e9bd66cef247ef53f0",
            "34bc9aaaca26469a800e4b887aead284",
            "8d5bbcc74d4c463db6def4e25dd9e5ac",
            "05cea30bb8944ab99fd226a6b244a121",
            "fbb4f2a3250d430d86991cbe2691fb23"
          ]
        },
        "id": "wzF9e3SVcKl8",
        "outputId": "a7741908-f739-4ffe-90ff-02ea9b7ca24f"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "7090556da17f4a45a881deff320a301f",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Map:   0%|          | 0/4838 [00:00<?, ? examples/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "text/plain": [
              "Dataset({\n",
              "    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references', 'uid'],\n",
              "    num_rows: 4838\n",
              "})"
            ]
          },
          "execution_count": 9,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "data = data.map(lambda x: {\n",
        "    'uid': f\"{x['doi']}-{x['chunk-id']}\"\n",
        "})\n",
        "data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "id": "5Zun9xprcKl8"
      },
      "outputs": [],
      "source": [
        "data = data.to_pandas()\n",
        "# drop irrelevant fields\n",
        "data = data[['uid', 'chunk', 'title', 'source']]"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "0zp3hhNBcKl9"
      },
      "source": [
        "`chunk` will be the text that we encode and store inside Pinecone. To encode that text data we need to use an embedding model, for that we can use open source [sentence transformers](https://github.com/UKPLab/sentence-transformers), Cohere, OpenAI, and many other services. In this example we will use OpenAI, to do so we will need an [OpenAI API key](https://platform.openai.com) (there will be some minor embedding cost incurred here)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "id": "AuUszBFzcKl9"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "os.environ[\"OPENAI_API_KEY\"] = os.environ.get(\"OPENAI_API_KEY\") or \"YOUR_API_KEY\""
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "mt0oF0_ScKl9"
      },
      "source": [
        "Now we can create embeddings like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "id": "4fLOSa59cKl9"
      },
      "outputs": [],
      "source": [
        "import openai\n",
        "\n",
        "embed_model_id = \"text-embedding-ada-002\"\n",
        "\n",
        "res = openai.Embedding.create(\n",
        "    input=[\n",
        "        \"We would have some text to embed here\",\n",
        "        \"And maybe another chunk here too\"\n",
        "    ], engine=embed_model_id\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "rgzqzHM3cKl9",
        "outputId": "4dd0786c-3810-4304-94da-694e3473f82f"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "dict_keys(['object', 'data', 'model', 'usage'])"
            ]
          },
          "execution_count": 13,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "res.keys()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "6KbZRU2XcKl9"
      },
      "source": [
        "Inside the response `res` we will find a JSON like object containing our new embeddings within the `data` field:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "0UhfV2PWcKl9",
        "outputId": "ec81d686-3acf-49a1-ab22-9d6e8d3fd547"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "2"
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "len(res['data'])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3PRI7VdRcKl9",
        "outputId": "7f0a2e9a-d93d-44e4-b4e4-706d5dfeac91"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "(1536, 1536)"
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "len(res['data'][0]['embedding']), len(res['data'][1]['embedding'])"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "4K0d29jDcKl9"
      },
      "source": [
        "Each embedding has a dimensionality of `1536`, as this is the embedding dimensionality of the `text-embedding-ada-002` model. We will apply this same embedding logic to the dataset we downloaded before, but before doing so we must create a vector DB index where we can store those embeddings."
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "iQ0MFH5ncKl9"
      },
      "source": [
        "## Creating the Knowledge Base"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "5n8R94bCcKl-"
      },
      "source": [
        "Now we need a place to store these embeddings and enable a efficient vector search through them all. To do that we use Pinecone, we can get a [free API key](https://app.pinecone.io/) and enter it below where we will initialize our connection to Pinecone and create a new index."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "kl7WXGFxcKl-",
        "outputId": "ff1c0e90-78bb-460d-8ff6-0e8b4d50cdd9"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "WhoAmIResponse(username='c78f2bd', user_label='default', projectname='9a4fbb6')"
            ]
          },
          "execution_count": 16,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import os\n",
        "from pinecone import Pinecone\n",
        "\n",
        "# initialize connection to pinecone (get API key at app.pinecone.io)\n",
        "api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'\n",
        "\n",
        "# configure client\n",
        "pc = Pinecone(api_key=api_key)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from pinecone import ServerlessSpec\n",
        "\n",
        "cloud = os.environ.get('PINECONE_CLOUD') or 'aws'\n",
        "region = os.environ.get('PINECONE_REGION') or 'us-east-1'\n",
        "\n",
        "spec = ServerlessSpec(cloud=cloud, region=region)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "as8ZZQsUcKl-"
      },
      "source": [
        "Create the index:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "e35m6GFHcKl-"
      },
      "outputs": [],
      "source": [
        "index_name = \"nemo-guardrails-rag-with-actions\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "gVh3zecHcKl-",
        "outputId": "d1627bea-aeca-4926-9975-6f70bb0c5165"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'dimension': 1536,\n",
              " 'index_fullness': 0.0,\n",
              " 'namespaces': {'': {'vector_count': 4838}},\n",
              " 'total_vector_count': 4838}"
            ]
          },
          "execution_count": 18,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import time\n",
        "\n",
        "# check if index already exists (it shouldn't if this is first time)\n",
        "if index_name not in pc.list_indexes().names():\n",
        "    # if does not exist, create index\n",
        "    pc.create_index(\n",
        "        index_name,\n",
        "        dimension=len(res['data'][0]['embedding']),\n",
        "        metric='cosine',\n",
        "        spec=spec\n",
        "    )\n",
        "    # wait for index to be initialized\n",
        "    while not pc.describe_index(index_name).status['ready']:\n",
        "        time.sleep(1)\n",
        "\n",
        "# connect to index\n",
        "index = pc.Index(index_name)\n",
        "# view index stats\n",
        "index.describe_index_stats()"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "Nj16rfrLcKl-"
      },
      "source": [
        "We can see the index is currently empty with a `total_vector_count` of `0`. We can begin populating it with OpenAI `text-embedding-ada-002` embeddings like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 49,
          "referenced_widgets": [
            "378b3bf651324e8a89cc35dbc5deb75e",
            "4cd31f35980c4adb929c98e01736acfb",
            "12dc71fc026c439bbb35ce783348dde8",
            "4b4fb5bd0ad945c48ca73f7710f9407d",
            "68991c91e49b485791239bb279ead5e5",
            "1da9a2e3371347928189ac6d0dfa47bd",
            "44a1fc7a38de443899e2d44e5153a386",
            "0c24194eeef94e81803cb33ffbc88e18",
            "930220685b6747b7b7a9c6138ad6cf1e",
            "b8d1ddbb52034a2db5c1aa573cda3ab2",
            "a988f01ffb1f41949d0390aba59624bd"
          ]
        },
        "id": "m68xo07QcKl-",
        "outputId": "ec1b29f5-757b-46fc-e752-18d2c59d7512"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "378b3bf651324e8a89cc35dbc5deb75e",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "  0%|          | 0/49 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "from tqdm.auto import tqdm\n",
        "\n",
        "batch_size = 100  # how many embeddings we create and insert at once\n",
        "\n",
        "for i in tqdm(range(0, len(data), batch_size)):\n",
        "    # find end of batch\n",
        "    i_end = min(len(data), i+batch_size)\n",
        "    batch = data[i:i_end]\n",
        "    # get ids\n",
        "    ids_batch = batch['uid'].to_list()\n",
        "    # get texts to encode\n",
        "    texts = batch['chunk'].to_list()\n",
        "    # create embeddings\n",
        "    res = openai.Embedding.create(input=texts, engine=embed_model_id)\n",
        "    embeds = [record['embedding'] for record in res['data']]\n",
        "    # create metadata\n",
        "    metadata = [{\n",
        "        'chunk': x['chunk'],\n",
        "        'source': x['source']\n",
        "    } for _, x in batch.iterrows()]\n",
        "    to_upsert = list(zip(ids_batch, embeds, metadata))\n",
        "    # upsert to Pinecone\n",
        "    index.upsert(vectors=to_upsert)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## RAG Functions for Guardrails"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "-ZbgLMxAcKl-"
      },
      "source": [
        "Now that we've added all of our text data to the index let's create a \"retrieve function\" that will allow us to take a user query, retrieve relevant records, and return them for use by our LLM.\n",
        "\n",
        "_Note: all functions defined and used with Guardrails `generate_async` must also be async functions._"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 71,
      "metadata": {
        "id": "ulmrh-RucKl-"
      },
      "outputs": [],
      "source": [
        "async def retrieve(query: str) -> list:\n",
        "    # create query embedding\n",
        "    res = openai.Embedding.create(input=[query], engine=embed_model_id)\n",
        "    xq = res['data'][0]['embedding']\n",
        "    # get relevant contexts from pinecone\n",
        "    res = index.query(xq, top_k=5, include_metadata=True)\n",
        "    # get list of retrieved texts\n",
        "    contexts = [x['metadata']['chunk'] for x in res['matches']]\n",
        "    return contexts"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "_FrhDqricKl-"
      },
      "source": [
        "We will create another function to perform the actual retrieval augmented generation (RAG) step given a particular query and contexts."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 72,
      "metadata": {
        "id": "7JVPD749cKl-"
      },
      "outputs": [],
      "source": [
        "async def rag(query: str, contexts: list) -> str:\n",
        "    print(\"> RAG Called\")  # we'll add this so we can see when this is being used\n",
        "    context_str = \"\\n\".join(contexts)\n",
        "    # place query and contexts into RAG prompt\n",
        "    prompt = f\"\"\"You are a helpful assistant, below is a query from a user and\n",
        "    some relevant contexts. Answer the question given the information in those\n",
        "    contexts. If you cannot find the answer to the question, say \"I don't know\".\n",
        "\n",
        "    Contexts:\n",
        "    {context_str}\n",
        "\n",
        "    Query: {query}\n",
        "\n",
        "    Answer: \"\"\"\n",
        "    # generate answer\n",
        "    res = openai.Completion.create(\n",
        "        engine=\"text-davinci-003\",\n",
        "        prompt=prompt,\n",
        "        temperature=0.0,\n",
        "        max_tokens=100\n",
        "    )\n",
        "    return res['choices'][0]['text']"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Guardrails"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now need to initialize our configs for Rails:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "yaml_content = \"\"\"\n",
        "models:\n",
        "- type: main\n",
        "  engine: openai\n",
        "  model: text-davinci-003\n",
        "\"\"\"\n",
        "\n",
        "rag_colang_content = \"\"\"\n",
        "# define limits\n",
        "define user ask politics\n",
        "    \"what are your political beliefs?\"\n",
        "    \"thoughts on the president?\"\n",
        "    \"left wing\"\n",
        "    \"right wing\"\n",
        "\n",
        "define bot answer politics\n",
        "    \"I'm a shopping assistant, I don't like to talk of politics.\"\n",
        "    \"Sorry I can't talk about politics!\"\n",
        "\n",
        "define flow politics\n",
        "    user ask politics\n",
        "    bot answer politics\n",
        "    bot offer help\n",
        "\n",
        "# define RAG intents and flow\n",
        "define user ask llama\n",
        "    \"tell me about llama 2?\"\n",
        "    \"what is large language model\"\n",
        "    \"where did meta's new model come from?\"\n",
        "    \"how to llama?\"\n",
        "    \"have you ever meta llama?\"\n",
        "\n",
        "define flow llama\n",
        "    user ask llama\n",
        "    $contexts = execute retrieve(query=$last_user_message)\n",
        "    $answer = execute rag(query=$last_user_message, contexts=$contexts)\n",
        "    bot $answer\n",
        "\"\"\""
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "MKb5yYPZcKl_"
      },
      "source": [
        "Note how we have created a user message (`user ask llama`) and flow (`llama`) for handling user queries if they ask about anything Llama 2 / LLM related:\n",
        "\n",
        "```colang\n",
        "define user ask llama\n",
        "    \"tell me about llama 2?\"\n",
        "    \"what is large language model\"\n",
        "    \"where did meta's new model come from?\"\n",
        "    \"how to llama?\"\n",
        "    \"have you ever meta llama?\"\n",
        "\n",
        "define flow llama\n",
        "    user ask llama\n",
        "    $contexts = execute retrieve(query=$last_user_message)\n",
        "    $answer = execute rag(query=$last_user_message, contexts=$contexts)\n",
        "    bot $answer\n",
        "```\n",
        "\n",
        "It executes the `retrieve` action using the `$last_user_message` to get our `$contexts`, we then pass the `$last_user_message` and `$contexts` to our `rag` action. We initialize our RAG-enabled rails with this Colang setup:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 74,
      "metadata": {
        "id": "nTMYxfWqcKl_"
      },
      "outputs": [],
      "source": [
        "from nemoguardrails import LLMRails, RailsConfig\n",
        "\n",
        "# initialize rails config\n",
        "config = RailsConfig.from_content(\n",
        "    colang_content=rag_colang_content,\n",
        "    yaml_content=yaml_content\n",
        ")\n",
        "# create rails\n",
        "rag_rails = LLMRails(config)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "DjhDEGxbcKl_"
      },
      "source": [
        "Remember! We need to register any actions that are used in the Colang config file, otherwise our rails have no idea how to `execute retrieve` or `execute rag`. We register both like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 75,
      "metadata": {
        "id": "suCXG0JfcKl_"
      },
      "outputs": [],
      "source": [
        "rag_rails.register_action(action=retrieve, name=\"retrieve\")\n",
        "rag_rails.register_action(action=rag, name=\"rag\")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "bIj2QY0vcKl_"
      },
      "source": [
        "Now let's try out our RAG agent."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 76,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "id": "aNL1xLEQcKl_",
        "outputId": "e2fcdeee-f458-4c72-ad3f-47302ce64d95"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "'Hi there! How can I help you today?'"
            ]
          },
          "execution_count": 76,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "await rag_rails.generate_async(prompt=\"hello\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 77,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 104
        },
        "id": "CZ-fKq5eeUCh",
        "outputId": "96328b30-43d3-44f7-ff02-4cba3e214f5a"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "> RAG Called\n"
          ]
        },
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "' Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. They are optimized for dialogue use cases and outperform open-source chat models on most benchmarks. They are also on par with some closed-source models, at least on the human evaluations performed. They are intended for commercial and research use in English and can be adapted for a variety of natural language generation tasks.'"
            ]
          },
          "execution_count": 77,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "await rag_rails.generate_async(prompt=\"tell me about llama 2\")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "Unj5KOEzECC7"
      },
      "source": [
        "We can see from the printed statement of `> RAG Called` that the RAG pipeline was used in our second prompt. However, it'd be interesting to see what the effect of this pipeline is on our actual answer. To do that, let's initialize a new Rails instance that doesn't include the call to RAG, so we can compare the results:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "no_rag_colang_content = \"\"\"\n",
        "# define limits\n",
        "define user ask politics\n",
        "    \"what are your political beliefs?\"\n",
        "    \"thoughts on the president?\"\n",
        "    \"left wing\"\n",
        "    \"right wing\"\n",
        "\n",
        "define bot answer politics\n",
        "    \"I'm a shopping assistant, I don't like to talk of politics.\"\n",
        "    \"Sorry I can't talk about politics!\"\n",
        "\n",
        "define flow politics\n",
        "    user ask politics\n",
        "    bot answer politics\n",
        "    bot offer help\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 78,
      "metadata": {
        "id": "nH2pdzYRECC7"
      },
      "outputs": [],
      "source": [
        "# initialize rails config\n",
        "config = RailsConfig.from_content(\n",
        "    colang_content=no_rag_colang_content,\n",
        "    yaml_content=yaml_content\n",
        ")\n",
        "# create rails\n",
        "no_rag_rails = LLMRails(config)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "9OECaqz4ECC7"
      },
      "source": [
        "Let's ask the same question:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 79,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 70
        },
        "id": "4dAYiJrKOFXi",
        "outputId": "a60500b0-8138-4cb9-b0a1-e012552104dd"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "\"Llama 2 is a text-to-speech software developed by NVIDIA. It is designed to generate natural sounding speech from text and is used in a variety of applications such as virtual assistants, chatbots, and automated customer service. The software is powered by NVIDIA's AI platform and uses a deep learning model to generate the audio output.\""
            ]
          },
          "execution_count": 79,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "await no_rag_rails.generate_async(prompt=\"tell me about llama 2\")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "gGDEQ-6JOMXs"
      },
      "source": [
        "Hmm, not the Llama 2 we were looking for \u2014 let's ask some more questions and compare RAG vs. no-RAG answers."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 80,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 52
        },
        "id": "uIGRkG_2OrQR",
        "outputId": "7a3f38cd-1d4b-433e-8008-fbb529391f32"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "'Red teaming was used in Llama 2 training to simulate enemy tactics and techniques. This allowed the trainees to practice dealing with realistic threats and build strategies to counter them.'"
            ]
          },
          "execution_count": 80,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# no RAG\n",
        "await no_rag_rails.generate_async(\n",
        "    prompt=\"what was red teaming used for in llama 2 training?\"\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "Mr3znSs_O7Lj"
      },
      "source": [
        "Our no RAG rails provide an interesting, but completely wrong answer. Let's try the same with our RAG-enabled rails:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 81,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 70
        },
        "id": "jqQf0nA2OHLj",
        "outputId": "06898bea-1340-48a4-d95f-e9b0e51fd591"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "> RAG Called\n"
          ]
        },
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "' Red teaming was used to identify risks and to measure the robustness of the model with respect to a red teaming exercise executed by a set of experts. It was also used to provide qualitative insights to recognize and target specific patterns in a more comprehensive way.'"
            ]
          },
          "execution_count": 81,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# with RAG\n",
        "await rag_rails.generate_async(\n",
        "    prompt=\"what was red teaming used for in llama 2 training?\"\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "iAVLtEXSPI3E"
      },
      "source": [
        "A perfect answer! Clearly, our RAG-enabled rail is far more capable of answering questions, while only calling the RAG action when required as set in our `actions.co` config file. We can confirm by asking more questions, we should see the printed statement `\"> RAG Called\"` will not appear unless the question is Llama 2 / LLM related:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 82,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "id": "W-9XBJNVPFEs",
        "outputId": "510d89ce-3444-4ec2-893d-a13cbfbf3890"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "'The sky is typically blue, but can change depending on atmospheric conditions and time of day.'"
            ]
          },
          "execution_count": 82,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "await rag_rails.generate_async(\n",
        "    prompt=\"what color is the sky?\"\n",
        ")"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "WW0OR-goQHCk"
      },
      "source": [
        "By using Guardrails for RAG in this way we manage to create a balance between the lightweight but naive approach of implementing [RAG with _every_ user call](https://www.pinecone.io/learn/series/langchain/langchain-retrieval-augmentation/) vs. the heavyweight and slow approach of implementing a [conversational agent with RAG tool access](https://www.pinecone.io/learn/series/langchain/langchain-agents/)."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "redacre",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.12"
    },
    "orig_nbformat": 4
  },
  "nbformat": 4,
  "nbformat_minor": 0
}